Greedy Adaptive Critics for LQR Problems: Convergence Proofs
Abstract
A number of success stories have been told in which reinforcement learning has been applied to problems in continuous state spaces, using neural nets or other sorts of function approximators in the adaptive critics. However, the theoretical understanding of why and when these algorithms work is inadequate. This is clearly exemplified by the lack of convergence results for a number of important situations. To our knowledge, only two such results have been presented for systems in the continuous state-space domain. The first is due to Werbos and is concerned with linear function approximation and heuristic dynamic programming. Since no optimal strategy can be found in that setting, the result is of limited importance. The second result is due to Bradtke and deals with linear quadratic systems and quadratic function approximators. Bradtke's proof is limited to ADHDP and policy iteration techniques, where the optimal solution is found through a number of successive approximations. This paper deals with greedy techniques, where the optimal solution is aimed for directly. Convergence proofs are presented for a number of adaptive critics: HDP, DHP, ADHDP and ADDHP. Optimal controllers for linear quadratic regulation (LQR) systems can be found by standard techniques from control theory, but the assumptions made in control theory can be weakened if adaptive critic techniques are employed. The main point of this paper, however, is not to emphasize the differences but to highlight the similarities, and by so doing contribute to a theoretical understanding of adaptive critics.
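The contrast drawn above between greedy techniques and policy iteration can be illustrated on a scalar LQR problem, x[t+1] = a x[t] + b u[t] with stage cost q x^2 + r u^2 and a quadratic critic V(x) = p x^2. This is a minimal sketch under stated assumptions, not the paper's algorithm: the plant (a, b) is assumed known and the exact one-step backup is used, whereas HDP, DHP, ADHDP and ADDHP estimate the critic from observed transitions. The function names (`dare_scalar`, `greedy_gain`, `greedy_value_iteration`, `policy_iteration`) and the example parameters are hypothetical. The greedy update acts greedily with respect to the current critic and backs up one step, starting from any p >= 0. The policy-iteration variant, a model-based analogue of the family the abstract attributes to Bradtke, needs a stabilizing initial gain and evaluates each gain exactly before improving it. Both should converge to the positive root of the discrete algebraic Riccati equation (DARE).

```python
import math


def dare_scalar(a: float, b: float, q: float, r: float) -> float:
    """Positive root of the scalar discrete algebraic Riccati equation
    p = q + a^2 p - (a b p)^2 / (r + b^2 p), assuming b != 0, q > 0, r > 0."""
    # Clearing the denominator gives b^2 p^2 + c p - q r = 0 with c as below.
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)


def greedy_gain(a: float, b: float, r: float, p: float) -> float:
    """Gain k of the control u = -k x that is greedy w.r.t. V(x) = p x^2."""
    return a * b * p / (r + b * b * p)


def greedy_value_iteration(a: float, b: float, q: float, r: float,
                           iters: int = 200, p0: float = 0.0) -> float:
    """Greedy (HDP-style, model-based) update: act greedily with respect to
    the current critic, then back up the cost of one step under that action.
    No stabilizing initial policy is required."""
    p = p0
    for _ in range(iters):
        k = greedy_gain(a, b, r, p)
        p = q + r * k * k + p * (a - b * k) ** 2
    return p


def policy_iteration(a: float, b: float, q: float, r: float,
                     k0: float, iters: int = 20) -> float:
    """Policy iteration: evaluate the fixed gain k exactly, then improve it.
    Requires a stabilizing initial gain, |a - b k0| < 1."""
    k = k0
    p = 0.0
    for _ in range(iters):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            raise ValueError("gain is not stabilizing")
        # Exact value of the fixed policy: V_k(x) = p x^2.
        p = (q + r * k * k) / (1.0 - closed_loop ** 2)
        k = greedy_gain(a, b, r, p)
    return p


if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0  # hypothetical open-loop unstable plant
    print("DARE:            ", dare_scalar(a, b, q, r))
    print("greedy iteration:", greedy_value_iteration(a, b, q, r))
    print("policy iteration:", policy_iteration(a, b, q, r, k0=0.5))
```

For this plant the three printed values should agree to many digits. The greedy iteration starts from p = 0, which corresponds to the non-stabilizing gain k = 0, and still converges; policy iteration with k0 = 0 would instead raise `ValueError`, since |a - b k0| = 1.2.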
Similar resources
A hybrid metaheuristic using fuzzy greedy search operator for combinatorial optimization with specific reference to the travelling salesman problem
We describe a hybrid meta-heuristic algorithm for combinatorial optimization problems with a specific reference to the travelling salesman problem (TSP). The method is a combination of a genetic algorithm (GA) and greedy randomized adaptive search procedure (GRASP). A new adaptive fuzzy greedy search operator is developed for this hybrid method. Computational experiments using a wide range of...
Global convergence for evolution strategies in spherical problems: some simple proofs and difficulties
This paper presents simple proofs for the global convergence of evolution strategies in spherical problems. We investigate convergence properties for both adaptive and self-adaptive strategies. Regarding adaptive strategies, the convergence rates are computed explicitly and compared with the results obtained in the so-called “rate-of-progress” theory. Regarding self-adaptive strategies, the com...
Using Greedy Randomize Adaptive Search Procedure for solve the Quadratic Assignment Problem
The greedy randomized adaptive search procedure is an iterative meta-heuristic for solving combinatorial problems. Each iteration consists of two phases: construction and local search. A high-quality feasible initial solution is built in the construction phase and is improved by local search in the second phase. The best solution over all iterations is returned as the output. In this stu...
Fitting the Three-parameter Weibull Distribution by using Greedy Randomized Adaptive Search Procedure
The Weibull distribution is widely employed in several areas of engineering because it is an extremely flexible distribution with different shapes. Moreover, it can include characteristics of several other distributions. However, successful use of the Weibull distribution depends on the accuracy with which its three parameters of scale, shape and location are estimated. This issue shifts attention to the requ...
Reinforcement Learning Applied to Linear Quadratic Regulation
Recent research on reinforcement learning has focused on algorithms based on the principles of Dynamic Programming (DP). One of the most promising areas of application for these algorithms is the control of dynamical systems, and some impressive results have been achieved. However, there are significant gaps between practice and theory. In particular, there are no convergence proofs for proble...